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PATENT 

Attorney Docket No.: 16869Q-106000US 
Client Reference No.: HSJ8-2003-0361 

TECHNIQUES FOR DETECTING AND CORRECTING ERRORS USING 
MULTIPLE INTERLEAVE ERASURE POINTERS 

BACKGROUND OF THE INVENTION 
5 [0001] The present invention relates to techniques for detecting and correcting errors on 

storage media, and more particularly, to techniques for detecting and correcting errors on storage 
media using multiple interleave erasure pointers in a multi-level integrated interleave format. 

[0002] Data bytes are stored on magnetic media in groups referred to as codewords (sectors). 
Typically, 512 data bytes are stored in each codeword. The codewords of data bytes may include 
10 multiple interleaved words (i.e., data byte strings or vectors) and corresponding check bytes that 
are byte interleaved. 

[0003] Multibyte errors are detected and corrected in long byte strings recorded on a storage 
medium in blocks. Each block comprises a plurality of codewords and a plurality of block-level 
check bytes derived from the codewords. Each codeword includes data bytes and codeword 
15 check bytes mapped from a plurality of equal-length data byte strings according to a linear error 
correction code. 

[0004] Each block is generated and recorded by logically summing the data byte strings and by 
mapping the logical sum and the data byte strings into counterpart codewords. The counter part 
codewords include codeword check bytes in accordance with the same linear error correction 
20 code. Next, the codewords are logically summed. The codewords and their logical sum are 
interleaved in a predetermined pattern prior to being recorded on a storage device or the like. 

[0005] Additional details of a technique for detecting and correcting error bytes is discussed in 
U.S. Patent 5,946,328, which is incorporated by reference herein. Three data streams mi(x), 
m2(x), and m3(x) are used as an example. Two of the data streams mi(x) and m2(x) are encoded 
25 by a first Reed-Solomon (RS) linear encoder that produces respective codewords Ci(x) and C2(x). 
This first RS encoder appends 2ti check bytes to each of the codewords. 

[0006] The third datastream m3(x) is modified to form the logical (modulo 2) sum of 
mi(x)+m2(x)+m3(x) prior to encoding by a second RS encoder. This second RS encoder includes 



2ti +2t 2 check bytes within a codeword c'(x) of the logically-summed datastream. The codeword 
C3(x) represents the logical sum of the three codewords ci(x)+c 2 (x)+c'(x). The third codeword 
c 3 (x) contains 2t 2 shared block check bytes and 2ti individual check bytes when generated in this 
manner. The block check bytes are inside and an intrinsic part of the RS codeword. 

5 [0007] Two of the codewords Ci(x) and c 2 (x) are generated by linear error correction encoding 
of respective data byte strings mi(x) and m 2 (x). Codewords Ci(x) and c 2 (x) require 2ti check 
bytes. The codeword c'(x) are also generated by the linear encoding of the modulo 2 sum of 
mi(x), m 2 (x), and m 3 (x). Codeword c'(x) requires 2ti + 2t 2 check bytes. Data byte strings mi(x) 
and m 2 (x) are appended with 2t 2 zeroes denoted by q>(x) in order to secure equal codeword 
10 length. The codeword outputs may then be expressed as: 

d(x) = x 2tI + 2t2 mj (x)+ <p (x)+n (x) 

c 2 (x)=x 2tI+2t2 m 2 (x)+ <p (x)+r 2 (x) 

c'(x) =x 2tI+2l2 [mj (x)+m 2 (x)+m 5 (x)J +r(x) 

[0008] The codeword c'(x) is further processed to produce a modified and third codeword c 3 
15 (x) by summing the three codewords Ci (x), c 2 (x), and c f (x) modulo 2 such that: 

c 3 (x)=[cr(x)+c 2 (x)]+c'(x)=x 2tl + 2t2 m 3 (x)+r B (x)+r 3 (x) 

[0009] The check bytes r B (x) are the block check bytes shared by mi (x), m 2 (x), and m 3 (x), 
whereas r 3 (x) are the individual check bytes of data stream m 3 (x). 

[0010] The check bytes ri(x), r 2 (x), and r 3 (x) are first level check bytes. These check bytes can 
20 only be used to correct ti errors in data bytes within codewords Ci(x), c 2 (x), c 3 (x), respectively. 
The check bytes r B (x) in codeword c 3 (x) are second (block) level check bytes that can be used to 
correct ti + 1 2 data byte errors in any one of codewords ci(x), c 2 (x), and c 3 (x). 

[0011] Following these computations, an integrated interleaved block of codewords ci (x) 3 c 2 
(x), and c 3 (x) is written out to the disk. Subsequently, when the disk must execute a read or read 
25 modify write command or the like, an addressed block or blocks of codewords are streamed from 
their track locations on the disk. Any errors on the disk are detected and corrected on the fly 
based on the syndrome processing of the codewords. The syndromes are derived from the 
purported codewords and their logical sum modulo 2. A non-zero syndrome indicates an error 
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byte. Any nonzero syndromes are identified and processed over the codewords to correct the 
bytes in error. The updated block-level nonzero syndromes are also processed to locate and 
correct bytes in error in any single codeword that exceeds the correction capability of that 
codeword. 

. 5 [0012] U.S. Patent 6,275,965, which is incorporated herein by reference, describes a system 
that corrects error bytes in a two-level code structure. Up to a maximum of ti error bytes in a 
codeword can be located and corrected using 2t { check bytes per codeword. A codeword with 
error bytes exceeding its ti codeword correction capacity is a bursty subblock. Up to a maximum 
of ti + 1 2 error bytes can be located and corrected in as many as B bursty subblocks using B*(2t 2 ) 

1 0 block-level check bytes. 

[0013] A post-encoding process is provided to "de-interleave" or redistribute the computed 
codewords into modified codewords such that the data vectors and their codeword check bytes 
are located in the same codeword. The block-level check bytes are distributed among the 
codewords. 

15 [0014] The techniques described in the '965 patent are useful for correcting error bytes that 
occur in random locations with the codewords. However, the number of error bytes in the 
codewords can exceed the correction capability of the techniques described in the '965 patent. 
These types of errors are referred to as burst errors. Burst errors can occur, for example, as the 
result of a scratch or defect on a magnetic disk. 

20 [0015] Therefore, it would be desirable to provide additional techniques for detecting and 

correcting burst errors occurring in data bytes that are formed in a two-level block code structure. 

BRIEF SUMMARY OF THE INVENTION 
[0016] The present invention provides techniques for detecting and correcting burst errors in 
25 data bytes formed in a two-level block code structure. According to the present invention, a 
second level decoder uses block level check bytes to detect columns in a two-level block code 
structure that contain error bytes. The second level decoder generates erasure pointers that 
identify columns in the two-level block structure that have been effected by burst errors. 
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[0017] A first level decoder then uses codeword check bytes to correct all of the bytes in the 
columns identified by the erasure pointers. The present invention frees up the first level decoder 
to use all of the codeword check bytes only for error byte value calculations. The first level 
decoder does not need to use any of the codeword check bytes for error location calculations, 
5 because the erasure pointers generated by the second level decoder provide all of the necessary 
error locations. The error correction capability of the first level decoder is doubled, because all 
of the codeword check bytes are freed up to be used only for error value calculations. 

[0018] Other objects, features, and advantages of the present invention will become apparent 
upon consideration of the following detailed description and the accompanying drawings, in 
10 which like reference designations represent like features throughout the figures. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0019] Figure 1 A is a block diagram detailing the architecture of a buffered hard disk drive 
controller that includes an on-the-fly error correction code (ECC) system for implementing on- 
1 5 the-fly error correction code. 

[0020] Figure IB is a block diagram of a data storage system depicting data flow along a read 
channel and a write channel of the hard disk drive controller of Figure 1A. 

[0021] Figure 2 illustrates a two-level block code structure according to the present invention. 

[0022] Figure 3 illustrates a first level decoder for detecting and correcting random errors 
20 according to the present invention. 

[0023] Figure 4 illustrates a second level decoder for detecting burst errors according to the 
techniques of the present invention. 

[0024] Figure 5 illustrates a first level decoder for correcting burst errors according to the 
techniques of the present invention. 

25 

DETAILED DESCRIPTION OF THE INVENTION 
[0025] Figures 1 A and IB illustrate an example of a hard disk drive control system for reading 
and writing data onto a magnetic hard disk. The hard disk drive control system of Figures 1 A- 



4 



IB is an example of hard disk drive system that can implement techniques of the present 
invention that are described above. The hard disk drive system of Figures 1 A- IB detect and 
correct errors in the data read from a disk. The hard disk drive system of Figures 1 A- IB can be 
used to implement the error correction techniques of the present invention, which are discussed 
5 in further detail below. 

[0026] Figure 1 A illustrates an exemplary architecture of a buffered hard disk drive controller 
50. Hard disk drive controller 50 is configured to read data from and write data to a magnetic 
hard disk 14. Controller 50 includes an on-the-fly error correction code (ECC) system 100 for 
implementing an on-the-fly error correction code. On-the-fly error correction code system 100 
10 includes an ECC read processor 163 and an ECC write processor 167. 

[0027] Figure IB is a block diagram of the hard disk drive controller 50 of Figure 1A that 
includes an on-the-fly error correction code system 100. When sequences of digital binary data 
are to be written onto the disk 14, they are placed temporarily in a buffer 165 shown in Figure 
1 A and subsequently processed and transduced along a write path or channel (167, 169, and 
15 157). 

[0028] The hard disk drive controller 50 includes a logic drive circuit 105 shown in Figure IB 
that formats data from hard disk assembly 33, for example from 8 bits to 32 bits. A disk 
assembly 33 is comprised of disk 14 and a head stack assembly including a spindle motor. A 
FIFO register 1 10 stores the formatted data and exchanges the same with a sector buffer 120. 
20 The ECC system 100 receives the formatted data from the drive logic circuit 105 and performs 
an error correction coding algorithm. ECC system 100 can also perform the techniques and 
processes of the present invention, which are discussed in detail below. 

[0029] A buffer manager 115 controls data traffic between the ECC system 100, a sector buffer 
(i.e., random access memory) 120, and a microprocessor 125. Another FIFO register 130 stores 
25 data and exchanges the same with the sector buffer 120. A sequence controller 135 is connected 
between the drive logic circuit 105, the microprocessor 125, and a host interface 140, to control 
the sequence operation of the data traffic and various commands across the hard drive controller 
50. The host interface 140 provides an interface between the hard drive controller 50 and a host 
60. 
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[0030] First, a predetermined number of binary data elements, also termed bytes, in a data 
string are moved from the buffer 165 and streamed through an ECC write processor 167. In the 
ECC write processor 167, the data bytes are mapped into codewords drawn from a suitable linear 
block or cyclic code such as a Reed-Solomon code. Next, each codeword is mapped in a write 
5 path signal-shaping unit 169 into a run length limited or other bandpass or spectral-shaping code 
and changed into a time-varying signal. The time-varying signal is applied through an interface 
read/write transducer interface 157 and then to the write element in a magneto resistive (or other 
suitable transducer head) for conversion into magnetic flux patterns. 

[0031] All of the measures starting from the movement of the binary data elements from buffer 
10 165 until the magnetic flux patterns are written on a selected disk track as the rotating disk 14 
passes under the read/write head are synchronous and streamed. For purposes of efficient data 
transfer, the data is de-staged (written out) or staged (read) a codeword at a time. 

[0032] Thus, both the mapping of binary data into Reed-Solomon codewords and the 
conversion to flux producing time-varying signals must be done well within the time interval 
1 5 defining a unit of recording track length moving under the transducer. Typical units of recording 
track length are equal fixed-length byte codewords of 512 bytes. 

[0033] When sequences of magnetic flux patterns are to be read from the disk 14, they are 
processed in a read path or channel (157, 159, 161, and 163) and written into the buffer 165. The 
time-varying signals sensed by a transducer are passed through the read/write transducer 
20 interface 157 to a digital signal extraction unit 159. Here, the signal is detected and a decision is 
made as to whether it should be resolved as a binary 1 or 0. As these l's and 0's stream out of the 
signal extraction unit 159, they are arranged into codewords in the formatting unit 161. 

[0034] Because the read path is evaluating sequences of Reed-Solomon codewords previously 
recorded on the disk 14, absent error or erasure, the codewords should be the same. In order to 
25 test whether that is the case, each codeword is applied to an ECC read processor 163 over a path 
from a formatter 161. 

[0035] Also, the output from the ECC processor 163 is written into buffer 165. The read path 
also operates in a synchronous data-streaming manner such that any detected errors must be 
located and corrected within the codeword well in time for the ECC read processor 163 to 
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receive the next codeword read from the disk track. The buffer 165 and the read and write 
channels may be monitored and controlled by the microprocessor 125 to ensure efficacy where 
patterns of referencing may dictate that a path not be taken down, such as sequential read 
referencing. 

5 [0036] Data bytes are typically stored in a data storage disk in codewords. Subsets of the 
codewords are grouped into long blocks. Each codeword contains first level CRC/ECC bytes, 
and each block includes second level ECCb bytes. 

[0037] A data storage system uses the first level CRC and ECC bytes to locate and to correct 
errors in data bytes within a codeword. When there are too many errors in the data bytes to be 
10 corrected by the first level ECC bytes, the data storage system uses the second level ECC B bytes 
to correct the errors as described in detail above. ECCb bytes are used for correcting data bytes 
in any of the codewords in a block. Miscorrection of an error can happen when both of the first 
level and the second level CRC and ECC bytes are in error. 

[0038] Further details of multiple level, integrated sector format, error correction code, 
15 encoding and decoding processes for data storage or communication devices is discussed in 

further detail in U.S. Patent Application Publication US 2003/0147167 Al, published August 7, 
2003, to Asano et al., which is incorporated by reference herein. 

[0039] Burst errors (also called hard errors) occur when a scratch or defect on a storage 
medium such as a magnetic disk wipes out several adjacent data bytes. Because data bytes are 
20 stored next to each other on a magnetic disk in codewords and blocks of data bytes, burst errors 
wipe out data bytes that are located next to each other within the codewords of a block of data. 

[0040] The present invention provides techniques for detecting and correcting data bytes that 
have been corrupted by burst errors on a storage medium. The techniques of the present 
invention take advantage of the fact that burst errors result in corrupted data bytes that are 
25 located next to each other within codewords of a block. 

[0041] A plurality of data streams are encoded to generate codewords as discussed above and 
in further detail in U.S. Patents 5,946,328 and 6,275,965. The codewords include data bytes and 
check bytes. The codewords include two levels of check bytes. Each codeword Ci, C2, C3, . . . 
includes a first level check byte n, r 2 , r 3 , . . ., respectively, that can be used to correct ti data 
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bytes with errors. One or more of the codewords also include second (block) level check bytes 
r B that can be used to correct ti + 12 data bytes in any of the codewords that have errors. 

[0042] The codewords that contain only the first level check bytes are referred to as the first 
level codewords. The codewords that contain the second level check bytes are referred to as the 
5 second level codewords. 

[0043] In an interleaved format, the first and second level codewords are arranged in a two- 
level block code structure. The block can be expressed such that each codeword Ci, c 2 , C3. . . is a 
row in the block as shown in Figure 2. The first level codewords are nested to form a first code 
CI in the block, and the second level codewords are nested to form a second code C2 in the 
10 block. 

[0044] Each codeword has n total bytes. The number of data bytes in each first level codeword 
is kl . The number of data bytes in each second level codeword is k2. Each of the first level 
codewords has 2ti check bytes, and each of the second level codewords has 2t2 check bytes. B is 
the number of bursty codewords that are correctable, and m is the number of data vectors. 

1 5 [0045] The Hamming distance between two codewords is the number of symbols in which 
they disagree. It can be shown that to correct T byte errors, a coding scheme requires the use of 
codewords with a Hamming distance of at least 2T + 1 between the codewords. The Hamming 
distance of the first code CI is di = 2t\ + 1. The Hamming distance of the second code C2 is d 2 
= 2t 2 + 1, where di <d 2 . 

20 [0046] The block C shown in Figure 2 includes codes CI and C2. Block C is applied to the 
inverse of an integrating matrix V" 1 to redistribute the data vectors, the codeword check bytes, 
and the block-level check bytes, as discussed in further detail with respect to Figures 5 and 6 of 
U.S. Patent 6,275,965. The integrating matrix is shown in Figure 2. 

[0047] A decoder is used to generate the error location and correction values from the 
25 syndromes detected in any of the codewords that form an interleaved block. Before decoding a 
block, the block is first de-multiplexed to separate out the data byte strings, block check bytes, 
and the codeword check bytes so that they can be validity tested. The syndrome generators 
generate syndromes for the codewords in the block. Syndrome generating techniques are well- 
known to those of skill in the art. 



[0048] The syndromes derived from the codewords are applied to a first level Reed-Solomon 
decoder. The first level Reed-Solomon decoder can detect and correct up to xi bytes in error in 
any single codeword. Reed-Solomon decoders are well known to those skilled in the art. 

[0049] Figure 3 illustrates an example of a block 200 that contains four codewords Ci, C2> C3, 
5 and C4. Each codeword includes a data vector m and check bytes (the shaded portion). There are 
four data vectors m in block 200 and B = 1. The codewords Ci are applied to a first level decoder 
201. 

[0050] Decoder 201 can detect and correct random errors occurring in data vectors in the 
codewords. Decoder 201 uses syndromes to detect and correct up to ii bytes in error in any one 
10 of the four codewords. Decoder 201 outputs corrected codewords cV The corrected codewords 
include the corrected data vectors. The number of errors that are corrected x\ is less than half the 
first level Hamming distance (xj < di/2). 

[0051] Another syndrome generator produces any nonzero syndromes based on the second 
(block) level check bytes in the block. These syndromes are modified to remove the effects of 
15 any errors located by the first level decoder. If the updated syndromes are zero, the correction 
computed by the first level decoder is deemed correct. If, however, the updated syndromes are 
nonzero or if the first level decoder detects a failure, then the first level syndromes in the phase 
that failed and the second level block syndromes are applied to a second level Reed-Solomon 
decoder. 

20 [0052] Thus, if the first level decoder 201 is unsuccessful at correcting the errors in one of the 
codewords, the second level Reed-Solomon decoder attempts to correct the bytes in error. A 
second level Reed-Solomon decoder uses the codeword check bytes and the block check bytes to 
correct bytes in error as shown in Figure 4. The second level decoder can correct up to xi bytes 
in error in the sum of all the codewords. 

25 [0053] The two level block structure is such that the sum of all of the interleaves is a codeword 
in the second code C2. In the presence of a burst error, adjacent symbols in a codeword are 
erased. The symbols in the second level codeword, obtained as the sum of all the interleaves, are 
sums of adjacent symbols in the codeword. Thus, error locations computed by the second level 
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decoder are natural erasure locations for all of the interleaves. The present invention takes 
advantage of this fact to locate and correct burst errors. 

[0054] The codewords Ci are applied to a second level decoder 301 shown in Figure 4. Second 
level decoder 301 can locate columns in a two-level block code structure that have been affected 
5 by burst errors. Decoder 301 receives syndromes generated from the codeword check bytes and 
syndromes generated from the block level check bytes. Using these syndromes, decoder 301 can 
detect the columns that contains errors. 

[0055] Second level decoder 301 outputs erasure pointers Ej. The erasure pointers point to 
columns of bytes in the two-level block structure that contain errors. Thus, the second level code 

10 C2 is only used to detect the locations of error bytes in the block. Because burst errors affect 
adjacent bytes, typically an entire column of bytes in a block is wiped out. The second level 
code C2 (that includes the block level check bytes) identifies the columns that have the errors. In 
a burst error situation, it is likely that many or all of the bytes in an effected column contain 
errors. The first level code CI is then used to correct all of the bytes in the columns identified by 

1 5 the erasure pointers. 

[0056] For example, block 300 in Figure 4 contains four codewords Ci, c 2 , c 3 , and c 4 . Erasure 
pointers 311,312, and 313 point to the first, third, and fourth byte columns in block 300. The 
number of errors that can be corrected %i is less than half the second level Hamming distance (12 
< d 2 /2). 

20 [0057] The erasure pointers Ej and the codewords Ci are then inputted into a first level decoder 
401 as shown in Figure 5. First level decoder 401 corrects burst errors in a two level block code 
structure that has been previously processed by second level decoder 301 . Decoder 401 
interprets the erasure pointers generated by second level decoder 301 as locations of the bytes in 
error. Each erasure pointer points to a block column that contains error bytes. The first level 

25 decoder 401 uses the first level codeword check bytes to correct each of the bytes in the columns 
identified by the erasure pointers. The first level decoder 401 outputs corrected codewords c'i. 
If the number of erasure pointers Ej equals pi, then all possible combinations of ij and pi are true 
such as that 2t* + pi < di. 
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[0058] According to the present invention, the second level decoder 301 only uses the block 
level check bytes to detect the columns that contain error bytes. The block level check bytes are 
not used for error correction. The first level decoder 401 then uses the codeword check bytes 
and the erasure pointers to correct all of the bytes in the identified columns. 

5 [0059] This technique frees up the first level decoder to use the codeword check bytes only for 
error value calculations, because the first level decoder does not need to use any of the codeword 
check bytes for error location calculations. The error correction capability of the first level 
decoder doubles, because all of the codeword check bytes are freed up to be used for error byte 
value calculations. Prior art systems used the first level check bytes for error detection and 
10 correction. 

[0060] As discussed above, the prior art techniques described in U.S. Patents 5,946,328 and 
6,275,965 are useful for correcting random errors in data streams organized in a two-level block 
structure. The system may not be able to determine if the errors are random errors or burst 
errors. Therefore, if the prior art techniques fail to correct all of the bytes in error, the techniques 
15 of the present invention can be attempted to correct the bytes in error. If the errors are burst 
errors, the present invention is more likely to be able to correct all of the bytes in error. 

[0061] While the present invention has been described herein with reference to particular 
embodiments thereof, a latitude of modification, various changes, and substitutions are intended 
in the present invention. In some instances, features of the invention can be employed without a 
20 corresponding use of other features, without departing from the scope of the invention as set 

forth. Therefore, many modifications may be made to adapt a particular configuration or method 
disclosed, without departing from the essential scope and spirit of the present invention. It is 
intended that the invention not be limited to the particular embodiment disclosed, but that the 
invention will include all embodiments and equivalents falling within the scope of the claims. 
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